Artificial Intelligence
♻ ☆ Coarse-Tuning for Ad-hoc Document Retrieval Using Pre-trained Language Models LREC
Fine-tuning in information retrieval systems using pre-trained language
models (PLM-based IR) requires learning query representations and
query-document relations, in addition to downstream task-specific learning.
This study introduces coarse-tuning as an intermediate learning stage that
bridges pre-training and fine-tuning. By learning query representations and
query-document relations in coarse-tuning, we aim to reduce the burden on
fine-tuning and improve learning on downstream IR tasks. We propose
Query-Document Pair Prediction (QDPP) for coarse-tuning, which predicts the
appropriateness of query-document pairs. Evaluation experiments show that the
proposed method significantly improves MRR and/or nDCG@5 in four ad-hoc
document retrieval datasets. Furthermore, results on the query prediction task
suggest that coarse-tuning facilitates the learning of query representations
and query-document relations.
comment: Accepted at LREC-COLING 2024
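The abstract does not say how QDPP training examples are constructed. As a minimal sketch, assuming QDPP is framed as binary classification over (query, document) pairs with randomly sampled negatives, the hypothetical helper `build_qdpp_pairs` below turns relevance judgments into labeled examples that a PLM classifier could then be coarse-tuned on. The negative-sampling scheme, the function name, and the toy data are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import Dict, List, Set, Tuple


def build_qdpp_pairs(
    qrels: Dict[str, Set[str]],
    corpus: List[str],
    seed: int = 0,
) -> List[Tuple[str, str, int]]:
    """Return (query, doc_id, label) examples for binary pair prediction.

    Hypothetical scheme: every judged-relevant pair is a positive (label 1),
    and each positive is matched with one randomly sampled document that is
    not judged relevant to the same query (label 0).
    """
    rng = random.Random(seed)  # seeded so the sampled negatives are reproducible
    examples: List[Tuple[str, str, int]] = []
    for query, relevant in qrels.items():
        # Candidate negatives: corpus documents not judged relevant to this query.
        negatives = [d for d in corpus if d not in relevant]
        for doc_id in sorted(relevant):  # sort: set iteration order is arbitrary
            examples.append((query, doc_id, 1))
            if negatives:
                examples.append((query, rng.choice(negatives), 0))
    return examples


qrels = {"cheap flights to tokyo": {"d1"}, "python list sort": {"d2", "d3"}}
corpus = ["d1", "d2", "d3", "d4", "d5"]
for example in build_qdpp_pairs(qrels, corpus):
    print(example)
```

A PLM would then be trained to predict the label from the query and document text; that training step is omitted here, and the paper may use a different pairing or labeling scheme.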
♻ ☆ Imitating Cost-Constrained Behaviors in Reinforcement Learning ICAPS-24
Complex planning and scheduling problems have long been solved using various
optimization or heuristic approaches. In recent years, imitation learning that
aims to learn from expert demonstrations has been proposed as a viable
alternative for solving these problems. Generally speaking, imitation learning
is designed to learn either the reward (or preference) model or the behavioral
policy directly by observing the behavior of an expert. Existing work in
imitation learning and inverse reinforcement learning has focused on imitation
primarily in unconstrained settings (e.g., no limit on fuel consumed by the
vehicle). However, in many real-world domains, the behavior of an expert is
governed not only by reward (or preference) but also by constraints. For
instance, decisions for self-driving delivery vehicles depend not only on the
route preferences/rewards (which depend on past demand data) but also on the
fuel in the vehicle and the time available. In such problems, imitation
learning is challenging because decisions are dictated not only by the reward
model but also by a cost-constrained model. In this paper, we provide
multiple methods that match expert distributions in the presence of trajectory
cost constraints: (a) a Lagrangian-based method; (b) meta-gradients that find a
good trade-off between maximizing expected return and minimizing constraint
violation; and (c) a cost-violation-based alternating gradient. We empirically
show that leading imitation learning approaches imitate cost-constrained
behaviors poorly, and that our meta-gradient-based approach achieves the best
performance.
comment: Accepted to the 34th International Conference on Automated Planning
and Scheduling (ICAPS-24)
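To illustrate the general idea behind a Lagrangian-based treatment of cost constraints, here is a minimal sketch of Lagrangian relaxation with projected gradient updates on a toy one-parameter problem. The imitation objective J(θ) = -(θ - 3)^2, the cost C(θ) = θ, the budget d = 2, and the step sizes are all assumptions chosen so the answer is known in closed form (the KKT conditions give θ* = 2 and λ* = 2); this is a generic textbook scheme, not the paper's algorithm.

```python
from typing import Tuple


def lagrangian_descent_ascent(
    alpha: float = 0.05,   # primal step size (assumed)
    beta: float = 0.05,    # dual step size (assumed)
    budget: float = 2.0,   # cost limit d in the constraint C(theta) <= d
    steps: int = 2000,
) -> Tuple[float, float]:
    """Gradient ascent on theta and projected descent on lam for
    L(theta, lam) = J(theta) - lam * (C(theta) - d), with lam >= 0."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        grad_j = -2.0 * (theta - 3.0)  # toy J(theta) = -(theta - 3)^2
        cost, grad_c = theta, 1.0       # toy C(theta) = theta
        # Primal step: move theta uphill on L for the current multiplier.
        new_theta = theta + alpha * (grad_j - lam * grad_c)
        # Dual step: raise lam when the budget is exceeded, lower it
        # (but never below zero) when the constraint is slack.
        new_lam = max(0.0, lam + beta * (cost - budget))
        theta, lam = new_theta, new_lam  # simultaneous update
    return theta, lam


theta, lam = lagrangian_descent_ascent()
print(f"theta = {theta:.3f}, lambda = {lam:.3f}")
```

The multiplier stays at zero while the cost is under budget and grows only once the cost exceeds it, which pulls θ back from the unconstrained optimum at 3 to the constraint boundary at 2.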
♻ ☆ Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation
Large Language Models (LLMs) are emerging as promising approaches to enhance
session-based recommendation (SBR), where both prompt-based and
fine-tuning-based methods have been widely investigated to align LLMs with SBR.
However, prompt-based methods struggle to find optimal prompts that elicit
correct reasoning from LLMs because they lack task-specific feedback, leading to
unsatisfactory recommendations. Although fine-tuning-based methods attempt to
fine-tune LLMs with domain-specific knowledge, they face limitations such as
high computational costs and reliance on open-source backbones. To address such
issues, we propose a Reflective Reinforcement Large Language Model (Re2LLM) for
SBR, which effectively and efficiently guides LLMs to focus on the specialized
knowledge essential for more accurate recommendations. In particular, we first design the
Reflective Exploration Module to effectively extract knowledge that is readily
understandable and digestible by LLMs. To be specific, we direct LLMs to
examine recommendation errors through self-reflection and construct a knowledge
base (KB) comprising hints capable of rectifying these errors. To efficiently
elicit the correct reasoning of LLMs, we further devise the Reinforcement
Utilization Module to train a lightweight retrieval agent. It learns to select
hints from the constructed KB based on task-specific feedback; these hints
serve as guidance that helps correct the LLM's reasoning for better
recommendations. Extensive experiments on multiple real-world datasets
demonstrate that our method consistently outperforms state-of-the-art methods.
comment: 11 pages, 4 figures
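The Reinforcement Utilization Module is described only at a high level. As a hedged illustration of learning to select hints from a knowledge base using task-specific feedback, the sketch below swaps in a simple epsilon-greedy bandit in place of the paper's trained retrieval agent. The `HintSelector` class, the hint texts, and the simulated success rates are hypothetical, and the reward is assumed to be 1 when the LLM's recommendation is correct and 0 otherwise.

```python
import random
from typing import List


class HintSelector:
    """Epsilon-greedy bandit over a fixed knowledge base of hints (hypothetical)."""

    def __init__(self, hints: List[str], epsilon: float = 0.1, seed: int = 0) -> None:
        self.hints = hints
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * len(hints)
        self.values = [0.0] * len(hints)  # running mean reward per hint

    def select(self) -> int:
        # Explore a random hint with probability epsilon; otherwise exploit
        # the hint with the highest estimated reward.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.hints))
        return max(range(len(self.hints)), key=lambda i: self.values[i])

    def update(self, index: int, reward: float) -> None:
        # Incremental mean update: v <- v + (r - v) / n.
        self.counts[index] += 1
        self.values[index] += (reward - self.values[index]) / self.counts[index]


# Hypothetical KB of hints and simulated feedback: the probability that the
# LLM's recommendation is correct when a given hint is added to the prompt.
kb = [
    "Prefer items from the same category as the last click.",
    "Down-weight items the user has already purchased.",
    "Consider co-purchase patterns across the session.",
]
success_rate = [0.2, 0.3, 0.8]

agent = HintSelector(kb, epsilon=0.1, seed=42)
feedback_rng = random.Random(7)
for _ in range(2000):
    i = agent.select()
    reward = 1.0 if feedback_rng.random() < success_rate[i] else 0.0
    agent.update(i, reward)

best = max(range(len(kb)), key=lambda i: agent.values[i])
print(kb[best])
```

A bandit ignores the session context; a retrieval agent that conditions on the session and the LLM's error would need a contextual policy, which this sketch deliberately leaves out.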
♻ ☆ MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification
Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin
The objective of search result diversification (SRD) is to ensure that
selected documents cover as many different subtopics as possible. Existing
methods primarily utilize a paradigm of "greedy selection", i.e., selecting one
document with the highest diversity score at a time. These approaches tend to
be inefficient and are easily trapped in a suboptimal state. In addition, some
other methods aim to approximately optimize the diversity metric, such as
$\alpha$-NDCG, but the results remain suboptimal. To address these
challenges, we introduce Multi-Agent reinforcement learning (MARL) for search
result DIVersity, called MA4DIV. In this approach, each document is an
agent and the search result diversification is modeled as a cooperative task
among multiple agents. This approach allows for directly optimizing the
diversity metrics, such as $\alpha$-NDCG, while achieving high training
efficiency. We conducted preliminary experiments on public TREC datasets to
demonstrate the effectiveness and potential of MA4DIV. Because public TREC
datasets contain only a limited number of queries, we also construct a
large-scale dataset from industry sources and show that MA4DIV achieves
substantial improvements over existing baselines in both effectiveness and
efficiency on this industrial-scale dataset.
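For reference, the sketch below computes $\alpha$-NDCG@k and implements the "greedy selection" paradigm that the abstract describes as the existing baseline (not MA4DIV itself). It assumes binary subtopic judgments: a document at rank r earns, for each subtopic it covers, a gain of (1 - α) raised to the number of earlier documents covering that subtopic, discounted by log2(r + 1). The greedy ranking also serves as the ideal ordering for normalization, a common approximation because computing the exact ideal is NP-hard. Function names and the toy judgments are illustrative.

```python
import math
from typing import Dict, List, Set

Judgments = Dict[str, Set[str]]  # doc_id -> set of subtopics it covers


def marginal_gain(doc: str, subtopics: Judgments, seen: Dict[str, int], alpha: float) -> float:
    # Each covered subtopic contributes (1 - alpha)^(times already covered).
    return sum((1 - alpha) ** seen.get(t, 0) for t in subtopics.get(doc, set()))


def alpha_dcg(ranking: List[str], subtopics: Judgments, alpha: float, k: int) -> float:
    seen: Dict[str, int] = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:k], start=1):
        score += marginal_gain(doc, subtopics, seen, alpha) / math.log2(rank + 1)
        for t in subtopics.get(doc, set()):
            seen[t] = seen.get(t, 0) + 1
    return score


def greedy_diversify(candidates: List[str], subtopics: Judgments, alpha: float, k: int) -> List[str]:
    # "Greedy selection": repeatedly append the document with the largest
    # marginal gain given what is already selected (ties go to the earlier candidate).
    selected: List[str] = []
    seen: Dict[str, int] = {}
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda d: marginal_gain(d, subtopics, seen, alpha))
        remaining.remove(best)
        selected.append(best)
        for t in subtopics.get(best, set()):
            seen[t] = seen.get(t, 0) + 1
    return selected


def alpha_ndcg(ranking: List[str], subtopics: Judgments, alpha: float = 0.5, k: int = 10) -> float:
    # Normalize by the greedy (approximate) ideal ranking over all judged documents.
    ideal = greedy_diversify(list(subtopics), subtopics, alpha, k)
    ideal_score = alpha_dcg(ideal, subtopics, alpha, k)
    return alpha_dcg(ranking, subtopics, alpha, k) / ideal_score if ideal_score > 0 else 0.0


judgments: Judgments = {"d1": {"a", "b"}, "d2": {"a"}, "d3": {"c"}, "d4": {"b"}}
print(greedy_diversify(["d1", "d2", "d3", "d4"], judgments, alpha=0.5, k=3))
print(round(alpha_ndcg(["d2", "d4", "d1"], judgments, alpha=0.5, k=3), 4))
```

Each greedy step commits to one document based only on what is already selected, which is the one-at-a-time behavior the abstract identifies as inefficient and prone to suboptimal rankings.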